Indoor scenes typically exhibit complex, spatially-varying appearance from global illumination, making inverse rendering a challenging ill-posed problem. This work presents an end-to-end, learning-based inverse rendering framework incorporating differentiable Monte Carlo raytracing with importance sampling. The framework takes a single image as input to jointly recover the underlying geometry, spatially-varying lighting, and photorealistic materials. Specifically, we introduce a physically-based differentiable rendering layer with screen-space ray tracing, resulting in more realistic specular reflections that match the input photo. In addition, we create a large-scale, photorealistic indoor scene dataset with significantly richer details like complex furniture and dedicated decorations. Further, we design a novel out-of-view lighting network with uncertainty-aware refinement leveraging hypernetwork-based neural radiance fields to predict lighting outside the view of the input photo. Through extensive evaluations on common benchmark datasets, we demonstrate superior inverse rendering quality of our method compared to state-of-the-art baselines, enabling various applications such as complex object insertion and material editing with high fidelity. Code and data will be made available at \url{https://jingsenzhu.github.io/invrend}.
translated by 谷歌翻译
在存在未衡量的混杂因素的情况下,我们解决了数据融合的治疗效应估计问题,即在不同的治疗分配机制下收集的多个数据集。例如,营销人员可以在不同时间/地点为相同产品分配不同的广告策略。为了处理由未衡量的混杂因素和数据融合引起的偏见,我们建议将观察数据分为多组(每个组具有独立治疗分配机制),然后将组指标显式地模拟为潜在的组仪器变量(LATGIV),将其模拟为实施基于IV的回归。在本文中,我们概念化了这种思想,并开发了一个统一的框架,以(1)估计跨群体观察到的变量的分布差异; (2)对不同治疗分配机制的LATGIV模型; (3)插入latgivs以估计治疗响应函数。经验结果证明了与最新方法相比,LATGIV的优势。
translated by 谷歌翻译
节流是当今在线广告市场中最受欢迎的预算控制方法之一。当一个受预算受限的广告商雇用节流功能时,她可以在广告平台建议出价后选择是否参加拍卖。本文重点介绍了从理论观点重复的第二价格拍卖中的动态预算节流过程。潜在问题的一个重要特征是,广告商不知道进入市场时竞争最高的出价。为了模拟消除这种不确定性的困难,我们考虑了两种不同的信息结构。广告商可以通过全信息反馈获得每轮竞争最高的投标。同时,通过部分信息反馈,广告商只能在她参加的拍卖中获得最高竞争的出价。我们提出了OGD-CB算法,该算法涉及在线广告查询面临的同时分配学习和收入优化。在这两种情况下,我们都证明该算法保证了$ O(\ sqrt {t \ log t})$遗憾,概率$ 1- o(1/t)$相对于流体自适应节流基准。通过证明$ \ omega(\ sqrt {t})$的下限在最小的后悔中,即使是最佳的最佳选择,我们就建立了算法的近乎最佳性。最后,我们将节流的最佳流体最佳与起搏相提并论,这是另一种广泛采用的预算控制方法。这些基准的数值关系使我们对不同的在线算法进行预算管理的比较有了进一步的见解。
translated by 谷歌翻译
一个良好的动作效果预测模型,称为环境模型,对于在机器人控制,推荐系统和患者治疗选择等许多领域中实现样本有效的决策政策学习非常重要。我们可以使用这种模型进行无限的试验来确定适当的行动,以便可以节省现实世界中的查询成本。它要求模型正确处理看不见的数据,也称为反事实数据。但是,标准数据拟合技术不会自动实现这种概括能力,通常会导致不可靠的模型。在这项工作中,我们在模型学习中引入了反事实风险最小化(CQRM),以推广到特定目标策略查询的反事实数据集。由于目标策略在政策学习中可能是各种各样且未知的,因此我们提出了一个对抗性CQRM目标,其中模型在对抗性策略查询的反事实数据上学习,并最终得出可拖延的解决方案Galileo。我们还发现,对抗性CQRM与对抗模型学习密切相关,从而解释了后者的有效性。我们将伽利略应用于综合任务和现实应用程序中。结果表明,伽利略对反事实数据做出了准确的预测,从而显着改善了现实世界测试的策略。
translated by 谷歌翻译
Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forgery types. Specifically, in our method, a teacher network takes as input the face images and generates an attention map of the deep features by a diverse multihead attention ViT. The attention map is used to guide a student network to focus on the low-attended features by reducing the highly-attended deep features. A deep feature mixup strategy is also proposed to synthesize forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method is able to achieve promising performances on unseen forgeries and highly compressed data.
translated by 谷歌翻译
In this work, we investigate improving the generalizability of GAN-generated image detectors by performing data augmentation in the fingerprint domain. Specifically, we first separate the fingerprints and contents of the GAN-generated images using an autoencoder based GAN fingerprint extractor, followed by random perturbations of the fingerprints. Then the original fingerprints are substituted with the perturbed fingerprints and added to the original contents, to produce images that are visually invariant but with distinct fingerprints. The perturbed images can successfully imitate images generated by different GANs to improve the generalization of the detectors, which is demonstrated by the spectra visualization. To our knowledge, we are the first to conduct data augmentation in the fingerprint domain. Our work explores a novel prospect that is distinct from previous works on spatial and frequency domain augmentation. Extensive cross-GAN experiments demonstrate the effectiveness of our method compared to the state-of-the-art methods in detecting fake images generated by unknown GANs.
translated by 谷歌翻译
The success of deep learning is partly attributed to the availability of massive data downloaded freely from the Internet. However, it also means that users' private data may be collected by commercial organizations without consent and used to train their models. Therefore, it's important and necessary to develop a method or tool to prevent unauthorized data exploitation. In this paper, we propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners. Specifically, the noise produced by the generator for each image has the confounder property. It can build spurious correlations between images and labels, so that the model cannot learn the correct mapping from images to labels in this noise-added dataset. Meanwhile, the discriminator is used to ensure that the generated noise is small and imperceptible, thereby remaining the normal utility of the encrypted image for humans. The experiments are conducted in six image classification datasets, consisting of three natural object datasets and three medical datasets. The results demonstrate that our method not only outperforms state-of-the-art methods in standard settings, but can also be applied to fast encryption scenarios. Moreover, we show a series of transferability and stability experiments to further illustrate the effectiveness and superiority of our method.
translated by 谷歌翻译
Google,Amazon和Microsoft等提供商提供的商业ML API已在许多应用程序中大大简化了ML的采用。许多公司和学者都为使用ML API用于对象检测,OCR和情感分析等任务。处理相同任务的不同ML API可能具有非常异构的性能。此外,API的基础模型也随着时间的推移而发展。随着ML API迅速成为一个有价值的市场,并且是消耗机器学习的广泛方式,因此系统地研究和比较不同的API并表征API随时间变化的方式至关重要。但是,由于缺乏数据,目前该主题目前没有被忽视。在本文中,我们介绍了HAPI(API的历史),该数据集由1,761,417个商业ML API应用程序(涉及来自亚马逊,Google,IBM,Microsoft和其他提供商的API),包括图像标签,文本识别和文本识别和文本识别和文本,从2020年到2022年的挖掘。每个实例都由API的查询输入(例如图像或文本)以及API的输出预测/注释和置信分数组成。 HAPI是ML API使用情况的第一个大型数据集,并且是研究ML-AS-A-Service(MLAAS)的独特资源。作为HAPI启用的分析类型的示例,我们表明ML API的性能会随着时间的流逝而大幅变化 - 在特定基准数据集上删除了几个API的精度。即使API的汇总性能保持稳定,其误差模式也可以在2020年至2022年之间在不同的数据子类型中转移。这种更改可能会大大影响使用某些ML API作为组件的整个分析管道。随着时间的流逝,我们进一步使用HAPI研究人口亚组的商业API绩效差异。 HAPI可以刺激MLAA的不断发展领域的更多研究。
translated by 谷歌翻译
在马尔可夫决策过程(MDP)中,可能存在不可观察的混杂因素并对数据生成过程产生影响,因此经典的非政策评估(OPE)估计器可能无法识别目标策略的真实价值函数。在本文中,我们研究了与可观察的仪器变量混杂的MDP中OPE的统计特性。具体而言,我们根据仪器变量提出了一个两阶段估计器,并在具有线性结构的混杂MDP中建立了其统计属性。对于非反应分析,我们证明了一个$ \ Mathcal {o}(n^{ - 1/2})$ - 错误绑定了$ n $是样本的数量。对于渐近分析,我们证明了两阶段估计量在渐近正常上,典型速率为$ n^{1/2} $。据我们所知,我们是第一个通过仪器变量显示混合线性MDP的两阶段估计量的统计结果。
translated by 谷歌翻译
我们提出了一种保护生成对抗网络(GAN)的知识产权(IP)的水印方法。目的是为GAN模型加水印,以便GAN产生的任何图像都包含一个无形的水印(签名),其在图像中的存在可以在以后的阶段检查以进行所有权验证。为了实现这一目标,在发电机的输出上插入了预先训练的CNN水印解码块。然后通过包括水印损失项来修改发电机损耗,以确保可以从生成的图像中提取规定的水印。水印是通过微调嵌入的,其时间复杂性降低。结果表明,我们的方法可以有效地将无形的水印嵌入生成的图像中。此外,我们的方法是一种通用方法,可以使用不同的GAN体系结构,不同的任务和输出图像的不同分辨率。我们还证明了嵌入式水印的良好鲁棒性能与几个后处理,其中包括JPEG压缩,噪声添加,模糊和色彩转换。
translated by 谷歌翻译